Tool Calls

Dataset: cybench_tools.parquet

In this example we visualize tool usage over a series of turns in a Cybench evaluation. A cell() mark shows the tool used at each message in each sample of the evaluation, and a text() mark on the right side of the frame notes any limit that ended a sample.

from inspect_viz import Data
from inspect_viz.view.beta import tool_calls

tools = Data.from_file("cybench_tools.parquet")

tool_calls(tools)

See the documentation on the tool_calls() function for details on the data it requires and on customizing various aspects of the plot. If you are curious about how the plot was implemented, read on below.
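Before plotting, it can help to sanity-check the data. The sketch below uses a hypothetical helper, check_tool_rows() (not part of Inspect Viz), and assumes the columns used in the implementation below (id, order, tool_call_function and limit). It also checks that each (id, order) pair occurs only once, since a repeated pair would put two cells at the same grid position. It works on plain Python records so it runs without extra dependencies; the tool_calls() documentation remains the authoritative description of the required data.

```python
from collections import Counter

# Columns used by the implementation below (an assumption; see the
# tool_calls() documentation for the authoritative requirements).
REQUIRED = ("id", "order", "tool_call_function", "limit")


def check_tool_rows(rows):
    """Return a list of problems found in a list of message records (dicts)."""
    problems = []
    for col in REQUIRED:
        if any(col not in row for row in rows):
            problems.append(f"missing column: {col}")
    # Each (sample id, message order) pair should map to exactly one cell.
    counts = Counter((row.get("id"), row.get("order")) for row in rows)
    for key, n in counts.items():
        if n > 1:
            problems.append(f"duplicate cell {key}: {n} rows")
    return problems


# Made-up records: message 2 of sample "s1" appears twice.
rows = [
    {"id": "s1", "order": 1, "tool_call_function": "bash", "limit": None},
    {"id": "s1", "order": 2, "tool_call_function": "submit", "limit": None},
    {"id": "s1", "order": 2, "tool_call_function": "python", "limit": None},
]
print(check_tool_rows(rows))
```

If the data combines several evals (or epochs of the same sample), the same sample id can appear more than once, so you may need to key the check on additional columns such as eval_id.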

Implementation

Here is an annotated version of the code required to produce the tool call plot above. The numbered notes after the code explain each step.

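from inspect_viz import Data
from inspect_viz.plot import plot, legend
from inspect_viz.mark import cell, text

# (1) read data (see 'Data Preparation' below)
data = Data.from_file("cybench_tools.parquet")

# (7) which tools to show, and in what order
tools = ["bash", "python", "submit"]

plot(
    # (2) cell() mark: one cell per message (x) in each sample (y),
    # coloured by the tool called
    cell(
        data,
        x="order",
        y="id",
        fill="tool_call_function"
    ),

    # (3) text() mark: the limit (if any) that ended each sample
    text(
        data,
        text="limit",
        y="id",
        frame_anchor="right",
        font_size=8,
        font_weight=200,
        dx=50
    ),
    legend=legend("color", location="right"),
    # (4) margins leave room for the axis labels and limit annotations
    margin_top=0,
    margin_left=20,
    margin_right=100,
    # (5) fewer x-axis ticks, no y-axis ticks
    x_ticks=list(range(0, 400, 80)),
    y_ticks=[],
    # (6) custom labels; color_domain keeps the tools in the order above
    x_label="Message",
    y_label="Sample",
    color_label="Tool",
    color_domain=tools
)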
1. Read tool call data (see Data Preparation for details).
2. cell() mark showing tool calls.
3. text() mark showing whether the sample terminated due to a limit.
4. Tweak the margins so the axis labels and text annotations appear correctly.
5. Reduce the number of tick marks on the x-axis and remove the y-axis ticks.
6. Set custom labels and ensure that the tools appear in the intended order.
7. Specify which tools to show and in what order.
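To make the encoding concrete, here is a text-only analogue of the plot (a sketch, not how Inspect Viz renders anything). Each line is a sample (the y channel), each character is a message position (the x channel, from order), the character stands in for the fill colour from tool_call_function, and any limit is appended on the right as the text() mark does. The symbol legend, the render_grid() helper and the records are all hypothetical. Because limit repeats on every message row of a sample, the sketch keeps a single label per sample.

```python
# Hypothetical legend: one character per tool, "." for no (or unlisted) tool.
SYMBOLS = {"bash": "b", "python": "p", "submit": "s"}


def render_grid(rows, symbols=SYMBOLS, blank="."):
    """Render message records as one text line per sample."""
    if not rows:
        return []
    samples = {}  # sample id -> {order: tool}
    limits = {}   # sample id -> limit label (if any)
    for row in rows:
        sid = row["id"]
        samples.setdefault(sid, {})[row["order"]] = row["tool_call_function"]
        if row.get("limit"):
            limits[sid] = row["limit"]
    width = max(row["order"] for row in rows) + 1
    lines = []
    for sid in sorted(samples):
        cells = samples[sid]
        # Missing positions and unknown tools both fall back to `blank`.
        line = "".join(symbols.get(cells.get(i), blank) for i in range(width))
        lines.append(f"{sid:<4}{line}  {limits.get(sid, '')}".rstrip())
    return lines


# Made-up records: "s2" ends early and carries a limit label.
rows = [
    {"id": "s1", "order": 0, "tool_call_function": None, "limit": None},
    {"id": "s1", "order": 1, "tool_call_function": "bash", "limit": None},
    {"id": "s1", "order": 2, "tool_call_function": "python", "limit": None},
    {"id": "s1", "order": 3, "tool_call_function": "submit", "limit": None},
    {"id": "s2", "order": 0, "tool_call_function": None, "limit": "message"},
    {"id": "s2", "order": 1, "tool_call_function": "bash", "limit": "message"},
    {"id": "s2", "order": 2, "tool_call_function": "bash", "limit": "message"},
]
for line in render_grid(rows):
    print(line)
```

Printing the grid gives one row per sample, with the limit label shown only for "s2".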

Data Preparation

To create the plot, we read a raw messages data frame from an eval log (see footnote 1), then filter it down to just the fields we need for visualization:

from inspect_ai.analysis.beta import messages_df, MessageColumns, SampleSummary

# read messages from log
log = "<path-to-log>.eval"
df = messages_df(log, columns=SampleSummary + MessageColumns)

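# trim columns
tools_df = df[[
    "eval_id",
    "id",
    "order",
    "tool_call_function",
    "limit"
]]

# write the trimmed data to the file read by Data.from_file() above
tools_df.to_parquet("cybench_tools.parquet")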

Trimming columns is particularly important because Inspect Viz embeds datasets directly in the web pages that host them, so keeping them small improves page load performance and reduces bandwidth usage.
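As a rough illustration of why this matters, the sketch below compares made-up message records before and after trimming them to the five columns above. It uses compact JSON size as a stand-in for payload size; that is an assumption, and it says nothing about the exact format Inspect Viz uses when embedding data. The trim() and json_size() helpers and the content column with a long message body are hypothetical.

```python
import json

# The five columns kept in the data preparation step above.
KEEP = ["eval_id", "id", "order", "tool_call_function", "limit"]


def trim(records, keep=KEEP):
    """Keep only the columns needed for the plot."""
    return [{col: rec.get(col) for col in keep} for rec in records]


def json_size(records):
    """Size in bytes of the records serialised as compact JSON."""
    return len(json.dumps(records, separators=(",", ":")).encode("utf-8"))


# Hypothetical records with an extra, bulky column standing in for
# message content that the plot does not need.
full = [
    {
        "eval_id": "e1",
        "id": "s1",
        "order": i,
        "tool_call_function": "bash",
        "limit": None,
        "content": "x" * 500,
    }
    for i in range(100)
]
trimmed = trim(full)
print(json_size(full), json_size(trimmed))
```

Most of the bytes in the full records come from the content column, which the plot never reads.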

Footnotes

  1. The eval log read for this example is in the inspect-viz-example-logs repo.